In this document, we’ll go through the many many different steps I’ve taken in an attempt to analyze several years of waterbird surveys at Estero La Cruz in Bahía de Kino, Sonora. Rather than only include the final model and results, I started this document when I began working with the data– so this document has evolved and grown alongside me as I worked through the various problems/errors/roadbloacks I encountered. While initally working with the data, I tried to use multiple variables to take advantage of the various data we collect at each survey. However, I’ve learned a lot about modeling and ultimately found myself following the advice to keep it all as simple as possible. By the end of this document, you’ll see that a lot of time and effort has been put into this to address a simple question- Has the number of shorebirds at Estero La Cruz changed over time?
The first steps are loading all required packages and of course—the data. This may seem like a long list of packages, but I’ve only added in the important ones that I know are being applied. Any additional packages that those on this list are dependent upon will also be loaded.
library(dplyr)
library(ggplot2); theme_set(theme_linedraw())
library(ggpubr)
library(ggeffects)
library(gridExtra)
library(stringr)
library(lme4)
library(MASS)
library(stringr)
library(lubridate)
library(tidyr)
library(readr)
library(RColorBrewer)
library(MuMIn)
library(patchwork)
library(pscl)
library(StatisticalModels)
library(numDeriv)
library(multcomp)
library(AICcmodavg)
library(sjPlot)All the necessary files should be saved in the working directory. The files we’ll Counts of birds in each guild for each survey at Estero La Cruz from 2013/14 to present cycle and these are “environment variables” taken at each survey. Similar to the biodiversity
guilds <- read_csv("guilds_final.csv",
col_types = cols(Date = col_date(format = "%m/%d/%y"),
ciclo = col_character(),
season = col_character()))Here’s a sneak peak of the data–as you can see it’s set up with each survey effort as a row with every guild as the first columns. After the guilds, we then have the environmental variables.
| Date | shore | gulls_terns | pel_corm | land | waders | waterfowl | Cloud | Temp | Precip |
|---|---|---|---|---|---|---|---|---|---|
| 2013-09-12 | 867 | 315 | 286 | 12 | 34 | 3 | 5 | 80 | 0 |
| 2013-09-27 | 2129 | 1996 | 363 | 4 | 50 | 1 | 2 | 80 | 0 |
| 2013-10-11 | 793 | 587 | 506 | 6 | 18 | 0 | 0 | 72 | 0 |
| 2013-10-25 | 1243 | 1508 | 459 | 1 | 57 | 0 | 0 | 75 | 0 |
| 2013-11-07 | 2704 | 1473 | 4527 | 6 | 97 | 98 | 0 | 77 | 0 |
For the sake of this document, we’ll fowllow all steps with the shorebirds and then present the results. Then, I’ll show some of the same necessary steps but mostly just present the results from the other guilds.
To model the data, I decided to use mixed models aka GLMMs or GLMERs.
EXLPAIN AND DESCRIBE MODELS BETTEER TO EXPLAIN WHAT THE EFFECTS ARE
random can be intercept or rate of change-
Here we’ll use shorebirds as the example. I started playing with the idea of mixed models because they are a common tool used in biological modeling. Given the data we’re wworking with and question we’re trying to answer, the random effect that needs to be taken into account with this modeling is the observer(s). With several years of data and different people counting birds over the years, that’s probably one of the biggest sources of variation in the data—and hence random effect of the observer(s)!!
This is the observer because we want to calculate and take into account the variance of the observer, but we do not necessarily want to quantify/estimate the impact of each individual observer/group of observers–and for this, it is the random effect.
So knowing that we wanted to use mixed models with count data, we have a few different types of GLMERs that we can run. Below I have selected negative binomial models because it is the best fit that allows the overdispersion in the data. For example, a Poisson model cannot be used because it assumes the variance is equal to the mean. Well in our dataset, the variance is much much greater than the mean (hence overdispersion) so a Poisson model would not be appropriate.
Like I mentioned above, we’ll use shorebirds as the example if the work that I’ve done. And after, I’ll share the results from my modeling of the other guilds.
We’ll start off by making a couple graphs to get an idea of the data we’re working with. Some things to note and keep in mind about the data are two very important things. 1. “temporada” is essentially year. But it is not named “year” because it’s the yearly work cycle/season. That translate to the year “14” is actually fall 2013-summer 2014, etc. 2. “season” is a categorical variable that has two levels: 1=fall/spring and 2=winter.
It’s always good to take a look at the date. For example here you can see that there’s quite a bit of variability in the counts in the years.
So here is my model:
glmernb.shore <- glmer.nb(shore ~ temporada + (1|obs), data=guilds)
GLMEROverdispersion(glmernb.shore) ##test for overdisperion## $residDev
## [1] 91.94035
##
## $residDF
## [1] 110
##
## $ratio
## [1] 0.8358214
##
## $P.ChiSq
## [1] 0.893584
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.062) ( log )
## Formula: shore ~ temporada + (1 | obs)
## Data: guilds
##
## AIC BIC logLik deviance df.resid
## 1955.3 1966.2 -973.6 1947.3 109
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.0262 -0.6600 -0.2132 0.3616 3.3033
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.3017 0.5493
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 10.50299 1.26539 8.300 <2e-16 ***
## temporada -0.18762 0.07606 -2.467 0.0136 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## temporada -0.991
Above I included a test for overdispersion since that was a big concern throughout my initial efforts in trying to model the data. As you can see, the ratio of the deviation to DF (of the residuals) is fairly close to 1. The P-value of the Chi-squared comparing the residual deviance and degrees of freedom to a chi-square distribution is pretty close to 1, which is good. This means we are not dealing with overdispersion.
Above you can also look over the summary of the model. Just from an intial look at this model summary, it looks like there is a significant decline in the abundance of shorebirds over time. It also appears, as expected, that there is a significant increase in the abundance of birds in the winter months in comparison to the rest of the year. Before we saying anything more, let’s do a handful of other steps to check the model and make sure this will be our final model. Stay tuned below first for some model selection work and later some further interpretation.
Let’s plot and take a look at our residuals–always an important step in verifying the model fit. I can tell you right now that these are the best I’ve been able to do thus far.
##put all the imporant things into a data frame
resid_shore<-data.frame(
pearsons_residuals=resid(glmernb.shore,type="pearson"),
deviance_residuals=resid(glmernb.shore,type="deviance"),
response_residuals=guilds$shore-predict(glmernb.shore,type = "response"),
predicted=predict(glmernb.shore,type = "response"),
observed=guilds$shore)
##plot predictions and resids
ggplot(resid_shore,aes(x=predicted,y=pearsons_residuals))+
geom_point() +
ggplot(resid_shore,aes(x=predicted,y=deviance_residuals))+
geom_point()+
ggplot(resid_shore,aes(x=predicted,y=response_residuals))+
geom_point()+
ggplot(resid_shore,aes(x=predicted,y=observed))+ geom_point()Similar to previous models, this isn’t necessarily the best looking set of residual graphs, but for now we can work with it.
Here I’ll show some of the steps I took when trying to select a model. The model above is a pretty good fit for what we’re doing, but there are some other thigns I could include in the model. To make sure that the first model above is indeed the best for our data, we’lll go through a series of steps to gain some confidence. It’s not quite as pretty as using something like the dredge() function from the MuMln package (I couldn’t get it to work with a glmer.nb() model), but as far as I understand model selection, we can do some of the same things but one step at a time.
First we’ll make a null model and compare it to our first simple model that we made that includes
null.shore <- glmer.nb(shore ~ 1 + (1|obs), data=guilds)
anova(null.shore, glmernb.shore) ##compare models in ANOVA table## Data: guilds
## Models:
## null.shore: shore ~ 1 + (1 | obs)
## glmernb.shore: shore ~ temporada + (1 | obs)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## null.shore 3 1958.4 1966.6 -976.19 1952.4
## glmernb.shore 4 1955.3 1966.2 -973.64 1947.3 5.1006 1 0.02392 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there is a significant difference, this means we can reject the null model and accept the simple negative binomial mixed model as a betterr fit to the data. I also feel OK about that given the AICc/BIC are slightly lower, and the log-Liklihood has moved a bit closer to 0 (which is another one of many things to look for).
Now let’s add in some other potential variables that would affect the presence of birds, in this case the season and the tide! Here tide is split into two variables- tide height (low/mid/high) and tide direction (rising/slack/falling).
##add season
glmernb.shore1 <- glmer.nb(shore ~ temporada + season + (1|obs), data=guilds)
##add tide height?
glmernb.shore2 <- glmer.nb(shore ~ temporada + season + tide_height + (1|obs),
data=guilds)
##add tide dir?
glmernb.shore3 <- glmer.nb(shore ~ temporada + season + tide_dir + (1|obs),
data=guilds)
##add both tide variables?
glmernb.shore4 <- glmer.nb(shore ~ temporada + season + tide_height + tide_dir +
(1|obs), data=guilds)We can make a neat model selection table based on information selection criteria. Here we will use the second order AIC (AICc) for comparison. For the model names, I abbreviated the names into “modX” so note that “mod” = “glmernb.shore”, “mod1” = “glmernb.shore1”, “mod2” = “glmernb.shore2”, etc…
shore.cand <- list(null.shore, glmernb.shore, glmernb.shore1, glmernb.shore2,
glmernb.shore3, glmernb.shore4)
mod.names <- list(c("null", "mod", "mod1", "mod2", "mod3","mod4"))
aictab(shore.cand, mod.names) ##this exports our clean table below##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## mod1 5 1928.79 0.00 0.40 0.40 -959.12
## mod3 7 1929.16 0.37 0.33 0.74 -957.05
## mod2 8 1930.45 1.65 0.18 0.91 -956.53
## mod4 10 1931.81 3.02 0.09 1.00 -954.83
## mod 4 1955.65 26.86 0.00 1.00 -973.64
## null 3 1958.60 29.81 0.00 1.00 -976.19
From this, our glmernb.shore1 model seemingly performs the best.
Let’s do another check by comparing increasingly complex models using ANOVAs. Again, if the Chi-squared value is significant, that means that the model is a better fit. We will set up comparisons between our simplest model (glmernb.shore) and each of the increasingly complex models. While we can use the code to set up single comparisons between models (ie. anova(mod1, mod2)) we can also combine it into one line of code such as anova(mod, mod1, mod2, ...) and we will get a table that compares our simplest model (mod1) with the rest of the models.
| npar | AIC | BIC | logLik | deviance | Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|---|---|---|---|---|
| glmernb.shore | 4 | 1955.278 | 1966.188 | -973.6392 | 1947.278 | NA | NA | NA |
| glmernb.shore1 | 5 | 1928.231 | 1941.868 | -959.1156 | 1918.231 | 29.047155 | 1 | 0.0000001 |
| glmernb.shore3 | 7 | 1928.096 | 1947.187 | -957.0479 | 1914.096 | 4.135420 | 2 | 0.1264751 |
| glmernb.shore2 | 8 | 1929.062 | 1950.881 | -956.5311 | 1913.062 | 1.033537 | 1 | 0.3093293 |
| glmernb.shore4 | 10 | 1929.656 | 1956.930 | -954.8280 | 1909.656 | 3.406285 | 2 | 0.1821103 |
Now let’s try it again, but starting with our best model so far, glmernb.shore1.
| npar | AIC | BIC | logLik | deviance | Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|---|---|---|---|---|
| glmernb.shore1 | 5 | 1928.231 | 1941.868 | -959.1156 | 1918.231 | NA | NA | NA |
| glmernb.shore3 | 7 | 1928.096 | 1947.187 | -957.0479 | 1914.096 | 4.135420 | 2 | 0.1264751 |
| glmernb.shore2 | 8 | 1929.062 | 1950.881 | -956.5311 | 1913.062 | 1.033537 | 1 | 0.3093293 |
| glmernb.shore4 | 10 | 1929.656 | 1956.930 | -954.8280 | 1909.656 | 3.406285 | 2 | 0.1821103 |
It looks like our best model was a good choice and we feel good about the glmernb.shore1 model and use it with confidence. That’s because what each of these tables show, is that with adding in the height and/or direction of the tide, we do not get a better fitting model. That means we should stick to our simple model that has the effect of work cycle/year and season, as it is the best fit.
Another potential approach to model selection would be to average the models. But rather than follow more coomplicated steps to do that, I think it’s OK to stick with this process of selecting the best model
Now that we have selected a model, one interesting piece of information about the model is checking the R^2 value. While this measure is commonly used in lms and glms it can be applied to mixed models. Here the function will calculate the R^2 values taking into account only the fixed effects (marginal) as well as all effects (conditionall) using 3 different methods. Here’s the explanataion taken from the help("r.squaredGLMM"):
Marginal R_GLMM² represents the variance explained by the fixed effects, and is defined as: \[R_GLMM(m)² = (σ_f²) / (σ_f² + σ_α² + σ_ε²)\] Conditional R_GLMM² is interpreted as a variance explained by the entire model, including both fixed and random effects, and is calculated according to the equation: \[R_GLMM(c)² = (σ_f² + σ_α²) / (σ_f² + σ_α² + σ_ε²)\] where σ_f² is the variance of the fixed effect components, σ_α² is the variance of the random effects, and σ_ε² is the “observation-level” variance.
So with that we can make a table of these values:
## R2m R2c
## delta 0.2803543 0.4955452
## lognormal 0.3210871 0.5675431
## trigamma 0.2281494 0.4032695
From the this, it looks like our model does a pretty decent job of fitting the data and explaining the variance. Given we’re modeling change over time and not taking into account other environmental variables that are likely affecting the presence/absence/abundance of birds, I feel okay about believing this model.
We’ll run more or less the same code as we used above. See the included .r file for the raw code.
Again, it looks more or less the same as before. Our residuals aren’t perfectly randomly distributed but I’m not sure what I could do at this point to get a better fit. Given the process to get to this model, I think it’s acceptable for the sake of this analysis.
Now it’s time to look at our model predictions. One of the most important aspects of a regression are the estimates of each variable in the model. Since we’re working with negative binomial models, there is a log link meaning the effect estimates need to be log transformed. We can do that easily by using R as a fancy calculator. Before we do that, let’s take a quick look at the different estimates of the model.
The first thing you might notice is that the intercept is very high- yes but it’s not as important. If we think about the data we’re modeling, there are counts of shorebirds in the thousands, and in the first year of data (2013-2014 season), there is a very high degree of variability.
What we really want to pay more attention to are the estimates of two fixed effects of the model. Just by looking at this grarph, the effect of season is a positive increase while that of temporada(which is the year/cycle) is negative.
If you recall from the summary output of the model the estimate of the variable temporada was: -0.2139745. To translate that into normal terms, as the year/cycle changes by 1, the change in shorebird abundance is multipled by a factor of 0.8073726. This can be interpreted into that “the abundance of shorebirds is declining by 19.2627394% each year.” One way we can visualize this is by plotting the predicted values on a graph. We can plot the overall trend of the model. Note that this will back-transform the predicted values to the original scale of our response variable.
However, we know there are differences between the season and that there was a significant effect in the model. From the model estimate of that variable, we can say winter season sees an increase of 298.1387504% in the number of shorebirds in comparison to the rest of the year. Knowing that the difference is so great, let’s take a look at the same graph as above, but grouped by season. Recall that the seasons are split into 1: Fall/Sping and 2: Winter.
Note that both graphs above are the same data, just plotted differently.
In both graphs, the seasons are split into 1 (Fall/Sping–the migration seasons) in purple and 2 (Winter) in blue. These colors will be consistent throughout the rest of this document. This shows that while shorebird abundance is declining in both seasons, it’s clear that the decline is more pronounced in the winter-we can visually see this as the slope of the line is steeper. This could potentially be due a decline in the migratory species overwintering, so it would be a good idea to look further into the migratory species.
Plot the random effects. And also a diagnostic plot: > For generalized linear mixed models, returns the QQ-plot for random effects.
So diagnostic plot is essentially just a qq plot
set_theme(base = theme_linedraw())
plot_model(glmernb.shore1, type="re", colors = "ipsum", vline.color = "red")## $obs
## `geom_smooth()` using formula 'y ~ x'
The first plot shows the random effects. The red line indicates no effect. The blue indicates a positive effect while purple is negtative. The second plot above, as mentioned, is a qqplot.
##show it as connected line
shore_resids <- cbind(resid_shore1, guilds$season)
shore_resids <- cbind(shore_resids, guilds$temporada)
shore_resids$temporada <- shore_resids$`guilds$temporada`
shore_resids$season <- shore_resids$`guilds$season`
ggplot(shore_resids, aes(guilds$temporada, predicted, colour = season)) +
scale_color_manual(values = c("#3f2d54", "#75b8d1")) +
stat_summary(geom="smooth", fun.data=mean_cl_normal, width=0.1, conf.int=0.95) +
stat_summary(geom="line", fun=mean, linetype="dashed") +
stat_summary(geom="point", fun=mean) ## Warning: Ignoring unknown parameters: width, conf.int
Below, I present the results of the other guilds.
Both during my model exploration with shorebirds as well as with other guilds, I ran into a handful of problems. One common problem I ran into was a failure for the model to converge. This convergence failure essentially means that the model fits the data extremely poorly, or the observations just don’t match up. But this isn’t the end of the world. For example, when I tried to apply a negative binomial mixed model to the waterfowl guild data, I ran into this problem. Below is an example of how I handled it.
##now the other guilds. #this outputs a lot of warnings and it isn't goood
glmernb.fowl1 <- glmer.nb(waterfowl ~ temporada + season + (1|obs), data=guilds)## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00631856 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(0.5939) ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
## Data: guilds
##
## AIC BIC logLik deviance df.resid
## 1158.7 1172.4 -574.4 1148.7 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.7671 -0.6636 -0.3612 0.3186 3.0002
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2323 0.482
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.54422 1.55325 6.145 8.01e-10 ***
## temporada -0.36895 0.09658 -3.820 0.000133 ***
## season2 1.29676 0.30908 4.195 2.72e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.987
## season2 0.219 -0.318
Luckily, I found this neat guide that I followed and found helpful. Below I follow the steps of the aformentioned guide and fix the problem. What the steps below basically do is try to optimize the model, starting from the inital fit but increasing the number of iterations.
## [1] 0.4820253
derivs1 <- glmernb.fowl1@optinfo$derivs
sc_grad1 <- with(derivs1, solve(Hessian,gradient))
max(abs(sc_grad1))## [1] 8.6121e-06
## [1] 8.6121e-06
ss <- getME(glmernb.fowl1,c("theta","fixef"))
glmernb.fowl2 <- update(glmernb.fowl1,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))No more failure to converge! So let’s check for overdispersion and look at the summary output of the model.
## $residDev
## [1] 81.82569
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.7506944
##
## $P.ChiSq
## [1] 0.9758231
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(0.5939) ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
## Data: guilds
## Control: glmerControl(optCtrl = list(maxfun = 20000))
##
## AIC BIC logLik deviance df.resid
## 1158.7 1172.4 -574.4 1148.7 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.7671 -0.6636 -0.3613 0.3186 3.0001
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2323 0.482
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.54401 0.55262 17.270 < 2e-16 ***
## temporada -0.36894 0.03602 -10.242 < 2e-16 ***
## season2 1.29669 0.30533 4.247 2.17e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.888
## season2 -0.019 -0.281
It looks like there is a significcant decline in the abundance of waterfowl over time. It also appears, as expected, that there is a significant increase in in the winter months in comparison to the rest of the year.
So now we can follow the steps we took with the shorebirds which means we’ll start by plotting the data.
Recall our model glmernb.fowl2 from directly above.
Similar to before, we can once again calculate the R^2 values. This time we’ll show it in neat table.
| R2m | R2c | |
|---|---|---|
| delta | 0.2931394 | 0.3782012 |
| lognormal | 0.3952562 | 0.5099498 |
| trigamma | 0.1673828 | 0.2159532 |
Once again, it’s looking like we’ve got a pretty decent looking model.
Let’s check the residuals
And now let’s look at the model predctions. Using the estimate of our model, we can say there has been a decline of 30.8533103% over the seasons and an increase of 365.7134803% in the winter.
Similar to the shorebirds, we can see a similar difference in that the decline seems much more stark in the winter.
Check the data
Make the model
glmernb.gulls <- glmer.nb(gulls_terns ~ temporada + season + (1|obs),
data=guilds)
summary(glmernb.gulls)## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.1854) ( log )
## Formula: gulls_terns ~ temporada + season + (1 | obs)
## Data: guilds
##
## AIC BIC logLik deviance df.resid
## 1805.0 1818.6 -897.5 1795.0 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.0678 -0.7161 -0.2335 0.6357 4.2453
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.08125 0.285
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.34682 0.98663 7.446 9.59e-14 ***
## temporada -0.04662 0.05946 -0.784 0.43309
## season2 0.57935 0.19179 3.021 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.988
## season2 0.028 -0.105
## $residDev
## [1] 94.99911
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.8715514
##
## $P.ChiSq
## [1] 0.8280607
Once again, it looks likee there is a significant decline in the abundance of gulls/terems/skimmers over time as well as a significant increease in the winter due to the influx of migrants. From the model estimates, there is a decline of 4.554998% of the in gulls/terns/skimmers guild. While this isn’t as exaggerated as the other guilds, it is still something worth noting.
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.0851377 | 0.1654172 |
| lognormal | 0.1105094 | 0.2147128 |
| trigamma | 0.0590585 | 0.1147470 |
Review the residuals
Plot the model predictions
Here we can visualize that the decline is not as steep as the other guilds, but still present. We can also once again see the notable differences between the winter and non-winter seasons.
Plot the data
Making the model:
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
ss <- getME(glmernb.pel,c("theta","fixef"))
pelmod <- update(glmernb.pel,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))
summary(pelmod)## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.4891) ( log )
## Formula: pel_corm ~ temporada + season + (1 | obs)
## Data: guilds
## Control: glmerControl(optCtrl = list(maxfun = 20000))
##
## AIC BIC logLik deviance df.resid
## 1433.4 1447.0 -711.7 1423.4 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.1437 -0.6468 -0.3666 0.2948 4.0194
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.4335 0.6584
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.42046 0.20036 42.026 < 2e-16 ***
## temporada -0.22764 0.01202 -18.942 < 2e-16 ***
## season2 0.87525 0.20483 4.273 1.93e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.510
## season2 -0.282 -0.280
## $residDev
## [1] 114.2199
##
## $residDF
## [1] 109
##
## $ratio
## [1] 1.047889
##
## $P.ChiSq
## [1] 0.3471449
Pelicans/cormorants/allies guild is declining by 20.3589084% each year.
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.2305754 | 0.5309768 |
| lognormal | 0.2593109 | 0.5971499 |
| trigamma | 0.1934467 | 0.4454756 |
Plot model predictions Something to note here is the axis. If you look at the boxplots above, you’ll see there are two very high counts (ploted as dots aka outliers) in the 2013-2014 season. Now that can very obviously be skewing the data, but take a second to look at the axis of our graphed model predictions and you’ll see that it’s much smaller.
The only guild that lacks a final model
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00645303 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(2.6765) ( log )
## Formula: waders ~ temporada + season + (1 | obs)
## Data: guilds
##
## AIC BIC logLik deviance df.resid
## 1175.5 1189.1 -582.8 1165.5 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3878 -0.6319 -0.1972 0.4254 3.2289
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2503 0.5003
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.02105 1.05489 5.708 1.14e-08 ***
## temporada -0.11036 0.06322 -1.746 0.0809 .
## season2 -0.06425 0.14351 -0.448 0.6543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.993
## season2 -0.038 -0.018
## $residDev
## [1] 95.30029
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.8743146
##
## $P.ChiSq
## [1] 0.822308
As you can see, this model failed to onverge and I haven’t worked on getting a better fit yet
We can put a summary of all the important parts of each model into a single table. Note that here the “estimates” of each variable have been back transformed. For example, in the shorebird model, rather than display the estimate of “-0.21397” for temporada (aka the year), the table will show the back transformed intercept (recall that for a negative binomial model we have to use the exp() function) so the number displayed is the incidence rate ratio of 0.8073726. This simply means that as that variable increases by 1 unit, the response variable is multipled by a factor of 0.8073726. Also note that the R^2 values displayed are calculated using the log-normal method that was shown in the tables above.
tab_model(glmernb.shore1, glmernb.fowl2, glmernb.gulls, pelmod,
dv.labels = c("Shorebirds", "Waterfowl", "Gulls/Terns/Skimmers",
"Pelicans/Cormorants/Allies"),
string.ci = "Conf. Int (95%)",
string.p = "P-Value", pred.labels = c("Intercept", "Year", "Season"))| Shorebirds | Waterfowl | Gulls/Terns/Skimmers | Pelicans/Cormorants/Allies | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predictors | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value |
| Intercept | 31436.98 | 2794.32 – 353676.27 | <0.001 | 13960.80 | 4726.26 – 41238.53 | <0.001 | 1551.26 | 224.32 – 10727.61 | <0.001 | 4538.97 | 3064.83 – 6722.15 | <0.001 |
| Year | 0.81 | 0.70 – 0.93 | 0.004 | 0.69 | 0.64 – 0.74 | <0.001 | 0.95 | 0.85 – 1.07 | 0.433 | 0.80 | 0.78 – 0.82 | <0.001 |
| Season | 2.98 | 2.06 – 4.32 | <0.001 | 3.66 | 2.01 – 6.65 | <0.001 | 1.78 | 1.23 – 2.60 | 0.003 | 2.40 | 1.61 – 3.58 | <0.001 |
| Random Effects | ||||||||||||
| σ2 | 0.55 | 0.99 | 0.61 | 0.52 | ||||||||
| τ00 | 0.32 obs | 0.23 obs | 0.08 obs | 0.43 obs | ||||||||
| ICC | 0.36 | 0.19 | 0.12 | 0.46 | ||||||||
| N | 42 obs | 42 obs | 42 obs | 42 obs | ||||||||
| Observations | 113 | 113 | 113 | 113 | ||||||||
| Marginal R2 / Conditional R2 | 0.321 / 0.568 | 0.395 / 0.510 | 0.111 / 0.215 | 0.259 / 0.597 | ||||||||
Another important grouping to make in the bird is between the migrantory and resident species for many reasons. One interesting reason just based off the guild muilds is that when looking at the plotted model predictions, it seemed that in general, the decrease in birds over time was more pronounced in the winter season.
mig <- read_csv("mig_final.csv", col_types = cols(Precip = col_double(),
season = col_character()), na = "0")## Warning: 112 parsing failures.
## row col expected actual file
## 1 Precip a double -- 'mig_final.csv'
## 2 Precip a double -- 'mig_final.csv'
## 3 Precip a double -- 'mig_final.csv'
## 4 Precip a double -- 'mig_final.csv'
## 5 Precip a double -- 'mig_final.csv'
## ... ...... ........ ...... ...............
## See problems(...) for more details.
## [1] "Date" "migrants" "residents" "Cloud" "Temp"
## [6] "Precip" "Wind" "temporada" "szn" "obs"
## [11] "tide_height" "tide_dir" "mes" "mes.yr" "season"
Start with migrants. Plot data:
Make the model
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.9745) ( log )
## Formula: migrants ~ temporada + season + (1 | obs)
## Data: mig
##
## AIC BIC logLik deviance df.resid
## 1807.2 1820.8 -898.6 1797.2 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3566 -0.5788 -0.1004 0.4177 2.9619
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2397 0.4896
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.68772 1.08189 8.030 9.74e-16 ***
## temporada -0.13459 0.06472 -2.080 0.0376 *
## season2 0.78086 0.15731 4.964 6.91e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.992
## season2 -0.050 -0.012
12.5925791% decrease over the years 218.3349139% higher in the winter
Plot the residuals
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.2065679 | 0.4611194 |
| lognormal | 0.2302981 | 0.5140921 |
| trigamma | 0.1782552 | 0.3979173 |
Plot model predictions
First we’ll look at the effects of year on the model, with the other effects considereed constants.
And now compare that graph to the model, but dividing out the season as we
Repeat workflow Plot data:
Make the model
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(2.2867) ( log )
## Formula: residents ~ temporada + season + (1 | obs)
## Data: mig
##
## AIC BIC logLik deviance df.resid
## 1669.8 1683.4 -829.9 1659.8 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3347 -0.6801 -0.2256 0.4741 3.0598
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.3366 0.5802
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.76939 1.19320 5.673 1.40e-08 ***
## temporada -0.05652 0.07160 -0.789 0.43
## season2 0.82429 0.15937 5.172 2.31e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.993
## season2 -0.020 -0.033
Residents, although decreasing it’s not significant. While they aren’t necessarily increasing either, this means that the birds are probably fluctuating year to year but overall are not changing. This is good!! Also, similar to all other groups, the numbers of residents increases 228.0261205% in the winter months.
Plot the residuals
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.1814257 | 0.5366360 |
| lognormal | 0.1970360 | 0.5828095 |
| trigamma | 0.1625532 | 0.4808134 |
Plot model predictions As we can see in the model prediction graphs above- our confidence intervals are very large. And though the trend is a decrease over time, there is not a lot to support that statement. So in looking at these graphs, we can see how much overlap there is in the points and confidence intervals.
Repeating what we did, we can create a model summary table for our two. And once again, what were the intercepts of the model have been back transformed into incidence rate ratios.
tab_model(mig.mod1, res.mod1,
dv.labels = c("Migrants", "Residents"),
string.ci = "Conf. Int (95%)",
string.p = "P-Value", pred.labels = c("Intercept", "Year", "Season"))| Migrants | Residents | |||||
|---|---|---|---|---|---|---|
| Predictors | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value |
| Intercept | 5929.65 | 711.42 – 49423.56 | <0.001 | 870.78 | 84.00 – 9027.43 | <0.001 |
| Year | 0.87 | 0.77 – 0.99 | 0.038 | 0.95 | 0.82 – 1.09 | 0.430 |
| Season | 2.18 | 1.60 – 2.97 | <0.001 | 2.28 | 1.67 – 3.12 | <0.001 |
| Random Effects | ||||||
| σ2 | 0.41 | 0.36 | ||||
| τ00 | 0.24 obs | 0.34 obs | ||||
| ICC | 0.37 | 0.48 | ||||
| N | 42 obs | 42 obs | ||||
| Observations | 113 | 113 | ||||
| Marginal R2 / Conditional R2 | 0.230 / 0.514 | 0.197 / 0.583 | ||||
This is a clean way to summarize some stuff.
Given that most of the guilds seem to be trending downward. Let’s take a look once again at the predicted values of the models for each guild. In each of the models, the starting point was much higher in the winter and seemingly decreased more notably.
One reason I had moved from working with biodiversity indices to working on this modeling project was because it was clear that important aspect of the survey data was missing. For the calculating the biodiversity indices for example, birds had to be identified to species. This completely glosses over the impact of larger aggregations of birds that are identified to a group but not necessarily species (ie estimations of “1000 peeps”). In separting the birds into guilds, this important aspect of data that can be captured.
gm1 <- ggplotGrob(plot(pr.mig1, show.title=FALSE, show.legend=TRUE, color="ipsum"))
gre1 <- ggplotGrob(plot(pr.res, show.title=FALSE, show.legend=TRUE, color="ipsum"))
grid.arrange(gm1, gre1)So in looking at our model, it doesn’t seem as if residents are changing but migrants are. This might suggest that the overall declines across the bird guilds are unequally attributed to declines in migrants overwintering. This could be a big jump to make, but it’s something that seemingly makes sense. BUT ALSO with the separation between migrant/resident spp, I did not include these unidentified birds. Despite this, we still saw some interesting changes and that migrants are following these trends to be declinging.
But finally, a big thing to keep in mind is that there is a lot of variability in all our data over time so all results should be taken with a grain of salt.
While this was fun, the work isn’t done. For example, there are several other cool questions that can be answered. Part of the problem is the surveys lack a consistent protocol. For example, I do now know how people in 2015 measured the tide or temperature. Gathering better data on historic weather/climate conditions would be interesting to look at changes or potential correlations between temp and bird relative abundances. Though we take data on the temperature during each monitoring effort, examining climate from a wider lens would be interesting. Precipitation patterns or extreme temperature events could potentially be affecting the phenology of migration in some birds that may be using temp/precip patterns as environmental cues. While some temperature and precipitation data are available here, I found the data in the area to be incomplete and lacking the last few years. Another interesting direction would be to relate the abundances of shorebirds (and/or other guilds) to tides. While we take data on the tides, it would be possible to more accurately characterize the tide.